Results 1 - 2 of 2
1.
47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022; 2022-May:8177-8181, 2022.
Article in English | Scopus | ID: covidwho-1948777

ABSTRACT

Speech-based automatic smoker identification (also known as smoker/non-smoker classification) aims to identify a speaker's smoking status from their speech. During the COVID-19 pandemic, speech-based automatic smoker identification approaches have received more attention in smoking cessation research because of their low cost and contactless sample collection. This study focuses on determining the best acoustic features for smoker identification. In this paper, we investigate the performance of four acoustic feature sets/representations extracted using three feature extraction/learning approaches: (i) hand-crafted feature sets, including the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS) and the Computational Paralinguistics Challenge Set (ComParE); (ii) Bag-of-Audio-Words representations; and (iii) neural representations extracted from raw waveform signals by SincNet. Experimental results show that: (i) SincNet feature representations are the most effective for smoker identification, outperforming the MFCC baseline features by 16% in absolute accuracy; (ii) the performance of the hand-crafted feature sets and the Bag-of-Audio-Words representations depends on the dimensionality of the feature vectors. © 2022 IEEE
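The hand-crafted sets named above (eGeMAPS and ComParE) are distributed with the openSMILE toolkit. The Python sketch below shows one plausible way to extract utterance-level functionals with the opensmile package; the audio path is a placeholder, and this illustrates the general technique rather than the authors' exact pipeline.

    import opensmile

    # eGeMAPSv02 functionals: 88 utterance-level statistics of low-level descriptors
    smile_egemaps = opensmile.Smile(
        feature_set=opensmile.FeatureSet.eGeMAPSv02,
        feature_level=opensmile.FeatureLevel.Functionals,
    )

    # ComParE 2016 functionals: a much larger set (6373 dimensions)
    smile_compare = opensmile.Smile(
        feature_set=opensmile.FeatureSet.ComParE_2016,
        feature_level=opensmile.FeatureLevel.Functionals,
    )

    # "speaker.wav" is a placeholder path; each call returns a pandas DataFrame
    # with one row of functionals for the whole recording
    x_egemaps = smile_egemaps.process_file("speaker.wav")
    x_compare = smile_compare.process_file("speaker.wav")

The two calls make finding (ii) concrete: the ComParE vector is roughly seventy times larger than the eGeMAPS vector. The Bag-of-Audio-Words representation can likewise be sketched in a few lines: frame-level descriptors are quantized against a learned codebook, and each utterance becomes a normalized histogram of codeword counts. The codebook size, descriptor dimensionality, and random data below are illustrative assumptions, not the paper's settings.

    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(0)

    # Placeholder frame-level LLDs standing in for real openSMILE output
    # (65 dimensions, matching the ComParE LLD count, is an assumption here)
    train_llds = rng.normal(size=(5000, 65))
    codebook = KMeans(n_clusters=128, n_init=10, random_state=0).fit(train_llds)

    def boaw(utterance_llds, km):
        # Assign each frame to its nearest codeword, then histogram the counts
        assignments = km.predict(utterance_llds)
        hist = np.bincount(assignments, minlength=km.n_clusters).astype(float)
        return hist / hist.sum()  # normalize so utterance length cancels out

    x = boaw(rng.normal(size=(300, 65)), codebook)  # one utterance's BoAW vector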

2.
47th IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2022; 2022-May:8482-8486, 2022.
Article in English | Scopus | ID: covidwho-1891390

ABSTRACT

COVID-19 is a respiratory disorder that can disrupt lung function. The effects of a dysfunctional respiratory mechanism can be reflected in other modalities that function in close coupling with respiration. Audio signals result from the modulation of respiration by the speech production system, so acoustic information can be modeled to detect COVID-19. In that direction, this paper addresses the second DiCOVA challenge, which deals with COVID-19 detection based on speech, cough, and breathing. We investigate modeling of (a) ComParE low-level descriptor (LLD) representations derived at frame- and turn-level resolutions and (b) neural representations obtained from pre-trained neural networks trained to recognize phones and to estimate breathing patterns. On Track 1, the ComParE LLD representations yield a best performance of 78.05% area under the curve (AUC). Experimental studies on Tracks 2 and 3 demonstrate that the neural representations tend to yield better detection than the ComParE LLD representations. Late fusion of different utterance-level representations of the neural embeddings yielded a best performance of 80.64% AUC. © 2022 IEEE
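The late-fusion step reported above is commonly implemented by averaging the per-utterance scores of the individual systems before computing AUC. A minimal sketch, assuming scikit-learn and invented placeholder scores (the paper's actual systems and numbers are not reproduced here):

    import numpy as np
    from sklearn.metrics import roc_auc_score

    # Placeholder per-utterance COVID-19 probabilities from two systems, e.g.
    # one built on phone-recognition embeddings and one on breathing-pattern
    # embeddings; values and labels are invented for illustration
    scores_phone = np.array([0.91, 0.22, 0.65, 0.40])
    scores_breath = np.array([0.84, 0.30, 0.71, 0.35])
    labels = np.array([1, 0, 1, 0])  # 1 = COVID-19 positive

    # Late fusion: average the systems' scores for each utterance
    fused = (scores_phone + scores_breath) / 2.0

    print("AUC (fused):", roc_auc_score(labels, fused))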
